The 2002 Perl Advent Calendar

File::Find::Rule

One of the core modules distributed with perl, File::Find, allows you to find files by recursively searching through directory paths on your hard drive. This is a good example of when to use a module: although the task may sound simple to write yourself, it's possible to get into all kinds of trouble in some special cases.

The trouble with File::Find is that it's quite hard to use, and not something beginners can easily get to grips with. It uses a callback interface similar to the one I described for URI::Find. As well as being confusing, this isn't the most convenient interface to use most of the time.
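To give a flavour of that callback interface, here's a minimal sketch that uses only core modules. The temporary directory and the file names in it are made up for illustration, so that the example can run anywhere.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec::Functions qw(catdir catfile);
use File::Temp qw(tempdir);

# Build a small throwaway tree (hypothetical names) to search.
my $root = tempdir( CLEANUP => 1 );
my $sub  = catdir( $root, "madness" );
mkdir $sub or die "mkdir $sub: $!";
for my $path ( catfile( $sub,  "It_Must_Be_Love.mp3" ),
               catfile( $root, "notes.txt" ) ) {
    open my $fh, '>', $path or die "open $path: $!";
    close $fh;
}

# File::Find calls this sub once for every entry it visits.
# Inside it, $_ is the bare name and $File::Find::name the full path.
my @files;
find( sub { push @files, $File::Find::name if -f $_ }, $root );

@files = sort @files;
print "$_\n" for @files;
```

All the filtering has to live inside the sub, and the results have to be collected by hand in a variable outside it.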

Enter File::Find::Rule, a module that does nothing but provide a new, simpler interface for File::Find.

[Read the documentation for File::Find::Rule on search.cpan.org]

So, I was pondering to myself, if I actually buy myself an mp3 player, then what mp3s have I got on my laptop right now that I could install onto it?

  use File::Find::Rule;

  # look up all the files below /home/mark/mp3
  my @files = File::Find::Rule->file
                              ->in("/home/mark/mp3");

This populates @files with a list of the countless mp3s I carry around with me on my laptop. They're all fully qualified paths like so:

  '/home/mark/mp3/madness/It_Must_Be_Love.mp3'

If I'd specified a relative path to File::Find::Rule, I'd have got relative paths back again. For example this code:

  # change to my home dir
  chdir("/home/mark");
  # find all the files in the 'mp3' dir in there
  @files = File::Find::Rule->file()
                           ->in("mp3");

populates @files with a list of mp3s that look like:

  'mp3/madness/It_Must_Be_Love.mp3'

You can easily convert between relative paths and absolute paths whenever you need to by using the File::Spec module.

  # use the functional form of File::Spec where it'll export
  # 'abs2rel' and 'rel2abs' into our namespace.
  use File::Spec::Functions qw(:ALL);
  # convert the absolute path to one relative to "/home/mark"
  print abs2rel("/home/mark/mp3/madness/It_Must_Be_Love.mp3",
                "/home/mark") . "\n";
  # convert the relative path to an absolute, assuming it
  # starts from "/home/mark"
  print rel2abs("mp3/madness/It_Must_Be_Love.mp3",
                "/home/mark") . "\n";

Omitting the second parameter (the "/home/mark") will cause File::Spec to just use the current working directory as its base, which is probably what we wanted anyway.
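A small sketch of that default, using only core modules. The relative path is the one from the example above; nothing needs to exist on disk, as rel2abs only manipulates the path string.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec::Functions qw(catfile rel2abs file_name_is_absolute);

my $relative = catfile( "mp3", "madness", "It_Must_Be_Love.mp3" );

# No base given: File::Spec falls back to the current working directory.
my $implicit = rel2abs($relative);

# Base given explicitly, for comparison.
my $explicit = rel2abs( $relative, getcwd() );

print "$implicit\n";
print "same as passing the current directory by hand\n"
    if $implicit eq $explicit;
```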

Back in our situation, I've suddenly realised that when I rip my music from CDs rather than downloading it from the web, I use the ogg format which I store in my mp3 dir as they're pretty much the same thing. However, since the mp3 player I'm looking at doesn't yet support oggs I'm not interested in those (oops, I sense much re-encoding in my future.) How do I just find the mp3 files?

  my @files = File::Find::Rule->file
                              ->name('*.mp3')
                              ->in("/home/mark/mp3");

So you can see we're chaining rules together. First we say that each match must be a file (we could have used the directory method to get a list of directories back instead). You can also see that the name method takes a standard unix file glob. You can use a standard perl regular expression in its place if you want, by using the qr operator.

  my @files = File::Find::Rule->file
                              ->name( qr{\.mp3$} )
                              ->in("/home/mark/mp3");
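One reason you might reach for the regular expression form is case: as far as I can tell the glob form is case sensitive, whereas a qr pattern can carry the /i flag. The sketch below assumes File::Find::Rule is installed, and the file names in the throwaway directory are invented for the demonstration.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Find::Rule;
use File::Spec::Functions qw(catfile);
use File::Temp qw(tempdir);

# Hypothetical file names in a throwaway directory.
my $root = tempdir( CLEANUP => 1 );
for my $name ( "It_Must_Be_Love.mp3", "SHOUTING.MP3", "cover.jpg" ) {
    my $path = catfile( $root, $name );
    open my $fh, '>', $path or die "open $path: $!";
    close $fh;
}

# The glob only matches the lower-case extension...
my @by_glob = File::Find::Rule->file->name('*.mp3')->in($root);

# ...whereas the regex is told to ignore case.
my @by_regex = File::Find::Rule->file->name( qr{\.mp3$}i )->in($root);

print "glob:  ", join( ", ", sort map { basename($_) } @by_glob ),  "\n";
print "regex: ", join( ", ", sort map { basename($_) } @by_regex ), "\n";
```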

I get another thought. What about all the mp3s of sound effects I've downloaded? Better not count any of them, so better disregard all files smaller than two hundred kilobytes.

    my @files = File::Find::Rule->file
                                ->name('*.mp3')
                                ->size(">=200K")
                                ->in("/home/mark/mp3");

And all the music I downloaded in the last week may or may not be any good, so we'd better not count that either.

    my $last_week = time()-(7*24*60*60);
    my @files = File::Find::Rule->file
                                ->name('*.mp3')
                                ->size(">=200K")
                                ->mtime("<$last_week")
                                ->in("/home/mark/mp3");
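The comparison can look backwards at first: mtime is an epoch timestamp, so a smaller number is further in the past and "<$last_week" means "last modified more than a week ago". Here's a sketch that checks this, assuming File::Find::Rule is installed. The two track names and their ages are made up, and utime is used to backdate the files.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Find::Rule;
use File::Spec::Functions qw(catfile);
use File::Temp qw(tempdir);

my $day       = 24 * 60 * 60;
my $now       = time();
my $last_week = $now - 7 * $day;

# Hypothetical tracks and their ages in days.
my %age_in_days = (
    "old_favourite.mp3" => 10,
    "new_download.mp3"  => 1,
);

my $root = tempdir( CLEANUP => 1 );
for my $name ( sort keys %age_in_days ) {
    my $path = catfile( $root, $name );
    open my $fh, '>', $path or die "open $path: $!";
    close $fh;

    # Backdate the access and modification times.
    my $mtime = $now - $age_in_days{$name} * $day;
    utime $mtime, $mtime, $path or die "utime $path: $!";
}

# Only files whose mtime is earlier than a week ago pass the rule.
my @keep = map { basename($_) }
    File::Find::Rule->file->mtime("<$last_week")->in($root);

print "$_\n" for @keep;
```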

Combining and negating rules

You can set up negative rules with the not clause. You simply need to create another rule that hasn't been executed with an in clause.

   my $backup = File::Find::Rule->file
                                ->name("*~","*.bak","#*#");
   # find large documents
   my @files = File::Find::Rule->file
                               ->size(">30K")
                               ->not( $backup )
                               ->in("/home/mark/docs");

Rules that haven't been executed with in can be happily combined. For example, finding files that are bigger than they should be:

   my $mp3 = File::Find::Rule->file
                             ->name('*.mp3')
                             ->size(">4M");
   my $jpg = File::Find::Rule->file
                             ->name('*.jpg')
                             ->size(">350K");
   my @files = File::Find::Rule->or($mp3, $jpg)
                               ->in("/home/mark");
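To see the or in action without needing megabytes of test data, here's a scaled-down sketch. It assumes File::Find::Rule is installed; the file names, the sizes and the much smaller thresholds (4K and 1K) are all invented for the demonstration.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Find::Rule;
use File::Spec::Functions qw(catfile);
use File::Temp qw(tempdir);

# Hypothetical files and their sizes in bytes.
my %size_of = (
    "song.mp3"   => 5000,
    "jingle.mp3" => 100,
    "photo.jpg"  => 2000,
    "icon.jpg"   => 100,
    "notes.txt"  => 9000,
);

my $root = tempdir( CLEANUP => 1 );
for my $name ( sort keys %size_of ) {
    my $path = catfile( $root, $name );
    open my $fh, '>', $path or die "open $path: $!";
    print {$fh} "x" x $size_of{$name};
    close $fh or die "close $path: $!";
}

# Each rule pairs a name test with its own size limit.
my $mp3 = File::Find::Rule->file->name('*.mp3')->size(">4K");
my $jpg = File::Find::Rule->file->name('*.jpg')->size(">1K");

# A file is reported if it satisfies either rule.
my @big = sort map { basename($_) }
    File::Find::Rule->or( $mp3, $jpg )->in($root);

print "$_\n" for @big;
```

Note that notes.txt is the largest file of all but is never reported, because it satisfies neither rule's name test.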

Because or stops at the first rule that matches (a kind of lazy evaluation), it can be used to keep your code from searching particular directories. By way of example, consider the Subversion version control system, which keeps a 'backup' copy of many of the files in your current directory in a subdirectory called .svn. Say we want to find all of the .pm files in a directory, but don't want to find those pesky backup files:

    # look for '.svn' and fail
    my $svn = File::Find::Rule->directory
                              ->name(".svn")
                              ->prune          # don't go into it
                              ->discard;       # don't report it
    my $pm = File::Find::Rule->file
                             ->name("*.pm");
    my @files = File::Find::Rule->or( $svn, $pm )
                                ->in("/home/mark/svn/advent/code");

As the $svn rule is checked first (it's the first argument to or), it gets to decide both that the search should not descend into .svn directories (the prune method) and that the other rule in the or should not even be consulted about whether the entry passes, so the entry is dropped from the results (the discard method).
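Here's a self-contained sketch comparing a search with and without the pruning rule. It assumes File::Find::Rule is installed, and the little directory tree (a lib directory with a .svn directory inside it) is made up for the purpose.

```perl
use strict;
use warnings;
use File::Find::Rule;
use File::Path qw(make_path);
use File::Spec::Functions qw(catdir catfile);
use File::Temp qw(tempdir);

# Hypothetical working copy: one real module and one .svn backup of it.
my $root = tempdir( CLEANUP => 1 );
make_path( catdir( $root, "lib", ".svn" ) );
for my $path ( catfile( $root, "lib", "Advent.pm" ),
               catfile( $root, "lib", ".svn", "Advent.pm" ) ) {
    open my $fh, '>', $path or die "open $path: $!";
    close $fh;
}

my $svn = File::Find::Rule->directory->name(".svn")->prune->discard;
my $pm  = File::Find::Rule->file->name("*.pm");

# Without the pruning rule both copies are found.
my @unpruned = $pm->in($root);

# With it, the search never descends into .svn.
my @pruned = File::Find::Rule->or( $svn, $pm )->in($root);

print "without prune: ", scalar(@unpruned), " file(s)\n";
print "with prune:    ", scalar(@pruned),   " file(s)\n";
```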

File::Find::Rule Extension Modules

File::Find::Rule has numerous extension modules. One such instance is the File::Find::Rule::MMagic module that provides an interface for checking the mime type of a file. For example, with this I can ignore spurious data (in my case normally oggs) that have been accidentally named with an mp3 extension:

    use File::Find::Rule::MMagic;
    my @files = File::Find::Rule->file
                                ->name("*.mp3")
                                ->magic('audio/mpeg')
                                ->in("/home/mark/mp3");

I can use the File::Find::Rule::MP3Info module to look for tracks that are by a particular artist:

    use File::Find::Rule::MMagic;
    use File::Find::Rule::MP3Info;
    my @files = File::Find::Rule->file
                                ->name("*.mp3")
                                ->magic('audio/mpeg')
                                ->mp3info( ARTIST => "Green Day")
                                ->in("/home/mark/mp3");

Note how I can load more than one extension module and they 'stack' - I get the ability to use rules from either module.

Legal Note

All mp3 files mentioned in this tutorial were downloaded legally through licensed agents of the copyright holders. All ogg files mentioned in this tutorial were extracted from my personal CD collection for my own personal use only.



Copyright 2000-2003 Mark Fowler, all rights reserved.
This documentation may be distributed under the Academic Free License